Include Python dependencies in README by simonw · Pull Request #6 · ggml-org/llama.cpp

simonw · 2023-03-11T04:26:54Z

Should maybe note that you need Python 3.10, because there's no torch wheel for Python 3.11 yet.

simonw · 2023-03-11T04:27:05Z

See also https://til.simonwillison.net/llms/llama-7b-m2

…gml-org#6)
* deprecate ffn_b
* get tensor offloading levels
* wip: split tensor loading
* wip: framework of loading sparse model tensors
* save and flush gpu alloc buffer
* vram budget will fall back to remaining free memory
* minor: remove vram safety margin
* add options for vram budget; clean old env vars
* minor: bugfix

…-org#6)
* Example using ET-Soc-1 emulator configuration

Example usage:

```bash
cmake -B build -DGGML_CUDA=OFF -DGGML_ET=ON -DLLAMA_CURL=OFF -DGGML_CCACHE=ON
cmake --build build --config Release -j $(nproc)

time ./build/bin/test-backend-ops

./build/bin/llama-server \
    --model Qwen3-0.6B-Q8_0.gguf \
    --alias Qwen3-0.6B-Q8_0 \
    -fa 0 \
    --ctx-size 1024 \
    --no-warmup \
    --host 127.0.0.1 \
    --port 8080
```

* softmax fp16 impl
* oai moe
* compat with new checkpoint
* add attn sink impl
* add rope scaling yarn
* logits match with latest transformers code
* wip chat template
* rm trailing space
* use ggml_scale_bias
* rm redundant is_swa_all
* convert interleaved gate_up
* graph : fix activation function to match reference (#7)
* vocab : handle o200k_harmony special tokens
* ggml : add attention sinks support (#1)
* llama : add attn sinks
* ggml : add attn sinks
* cuda : add attn sinks
* vulkan : add support for sinks in softmax
  remove unnecessary return
* ggml : add fused swiglu_oai op (#11)
* ggml : add fused swiglu_oai op
* Update ggml/src/ggml-cpu/ops.cpp
  Co-authored-by: Georgi Gerganov <[email protected]>
* update CUDA impl
* cont : metal impl
* add vulkan impl
* test-backend-ops : more test cases, clean up
* llama : remove unfused impl
* remove extra lines
---------
Co-authored-by: Georgi Gerganov <[email protected]>
---------
Co-authored-by: slaren <[email protected]>
* repack mxfp4 upon conversion
* clean up a bit
* enable thinking
* add quick hack to render only some special tokens
* fix bf16 conversion
* remove vocab hack
* webui ok
* support chat parsing for gpt-oss
* fix webui
* direct mapping mxfp4, FINALLY
* force using mxfp4
* properly use lazy tensor
* ggml : add mxfp4
  ggml : use e8m0 conversion instead of powf
  Co-authored-by: Diego Devesa <[email protected]>
  change kvalues_mxfp4 table to match e2m1 (#6)
  metal : remove quantization for now (not used)
  cuda : fix disabled CUDA graphs due to ffn moe bias
  vulkan : add support for mxfp4
  cont : add cm2 dequant
* ggml : add ggml_add_id (#13)
* ggml : add ggml_add_id
* add cuda impl
* llama : add weight support check for add_id
* perf opt
* add vulkan impl
* rename cuda files
* add metal impl
* allow in-place ggml_add_id
* llama : keep biases on CPU with --cpu-moe
* llama : fix compile error
  ggml-ci
* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
  ggml-ci
* cleanup
  ggml-ci
* sycl : fix supports_op for MXFP4
  ggml-ci
* fix Unknown reasoning format
* ggml-cpu : fix AVX build
  ggml-ci
* fix hip build
  ggml-ci
* cuda : add mxfp4 dequantization support for cuBLAS
  ggml-ci
* ggml-cpu : fix mxfp4 fallback definitions for some architectures
  ggml-ci
* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
---------
Co-authored-by: Xuan Son Nguyen <[email protected]>
Co-authored-by: slaren <[email protected]>
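The `ggml : add mxfp4` entries above mention an E8M0 conversion that replaces `powf` and a value table changed to match E2M1. To illustrate what those terms mean, here is a minimal MXFP4 block decoder written against the OCP Microscaling (MX) format, in which 32 E2M1 (4-bit) elements share one E8M0 (power-of-two) scale. The struct and function names and the nibble packing (low nibble = element j, high nibble = element j + 16) are assumptions for this sketch, not ggml's definitions; ggml's own `kvalues_mxfp4` table may store scaled integers rather than floats.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// E8M0 scale: exponent-only byte, value = 2^(e - 127); 0xFF encodes NaN.
static float e8m0_to_float(uint8_t e) {
    if (e == 0xFF) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    // Write e straight into the float32 exponent field instead of calling powf.
    // e == 0 (2^-127) is below the normal float range, so use the subnormal bit pattern.
    const uint32_t bits = (e == 0) ? 0x00400000u : (static_cast<uint32_t>(e) << 23);
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// E2M1 codes: bit 3 = sign, bits 2..1 = exponent (bias 1), bit 0 = mantissa.
static const float k_e2m1_values[16] = {
     0.0f,  0.5f,  1.0f,  1.5f,  2.0f,  3.0f,  4.0f,  6.0f,
    -0.0f, -0.5f, -1.0f, -1.5f, -2.0f, -3.0f, -4.0f, -6.0f,
};

constexpr int k_mxfp4_block = 32;

// Hypothetical block layout: one shared scale byte plus 32 packed 4-bit codes.
struct mxfp4_block_sketch {
    uint8_t e;                      // E8M0 shared scale
    uint8_t qs[k_mxfp4_block / 2];  // two E2M1 codes per byte
};

// Assumed packing: low nibble of qs[j] is element j, high nibble is element j + 16.
static void dequantize_mxfp4_block(const mxfp4_block_sketch & b, float * out) {
    const float d = e8m0_to_float(b.e);
    for (int j = 0; j < k_mxfp4_block / 2; ++j) {
        out[j]                     = d * k_e2m1_values[b.qs[j] & 0x0F];
        out[j + k_mxfp4_block / 2] = d * k_e2m1_values[b.qs[j] >> 4];
    }
}
```

With scale byte 128 (2.0), a packed byte 0xF7 decodes to 6 × 2 = 12 in the low-nibble slot and -6 × 2 = -12 in the high-nibble slot. The bit trick is exact for every normal exponent, which is one plausible reason to prefer it over `powf`.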

…16038) Initializing RESERVED_NAME in is_reserved_name() is not thread-safe and leads to corrupted memory when used from multiple threads, as can be seen in the ASan trace below. This fixes the initialization to make it thread-safe.

#0 0x000100abd018 in std::__1::pair<std::__1::__hash_iterator<std::__1::__hash_node<std::string, void*>*>, bool> std::__1::__hash_table<std::string, std::__1::hash<std::string>, std::__1::equal_to<std::string>, std::__1::allocator<std::string>>::__emplace_unique_key_args<std::string, std::string const&>(std::string const&, std::string const&) __hash_table:1565
#1 0x000100ab0320 in SchemaConverter::visit(nlohmann::ordered_json const&, std::string const&) json-schema-to-grammar.cpp:802
#2 0x000100aafc48 in std::__1::__function::__func<build_grammar(std::__1::function<void (common_grammar_builder const&)> const&, common_grammar_options const&)::$_2, std::__1::allocator<build_grammar(std::__1::function<void (common_grammar_builder const&)> const&, common_grammar_options const&)::$_2>, std::string (std::string const&, nlohmann::ordered_json const&)>::operator()(std::string const&, nlohmann::ordered_json const&) function.h:319
#3 0x000100a2c938 in std::__1::__function::__func<common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool)::$_0::operator()(common_grammar_builder const&) const::'lambda'(nlohmann::ordered_json const&), std::__1::allocator<common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool)::$_0::operator()(common_grammar_builder const&) const::'lambda'(nlohmann::ordered_json const&)>, void (nlohmann::ordered_json const&)>::operator()(nlohmann::ordered_json const&) function.h:319
#4 0x000100a139f8 in foreach_function(nlohmann::ordered_json const&, std::__1::function<void (nlohmann::ordered_json const&)> const&) chat.cpp:762
#5 0x000100a2a7f4 in std::__1::__function::__func<common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool)::$_0, std::__1::allocator<common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool)::$_0>, void (common_grammar_builder const&)>::operator()(common_grammar_builder const&) function.h:319
#6 0x000100aa98f4 in build_grammar(std::__1::function<void (common_grammar_builder const&)> const&, common_grammar_options const&) json-schema-to-grammar.cpp:982
#7 0x0001009c9314 in common_chat_params_init_llama_3_x(minja::chat_template const&, templates_params const&, bool) chat.cpp:1110
#8 0x0001009b8afc in common_chat_templates_apply_jinja(common_chat_templates const*, common_chat_templates_inputs const&) chat.cpp:1992
#9 0x0001009b533c in common_chat_templates_apply(common_chat_templates const*, common_chat_templates_inputs const&) chat.cpp:2074
#10 0x000100810120 in llamacpp_apply_chat_template+0x724 (predict_oai-98384e17fb94e863:arm64+0x100090120)
...
==45482==Register values:
x[0] = 0x00006020004147f8 x[1] = 0x00006080000013c8 x[2] = 0x0000000000000000 x[3] = 0x0000604006289738
x[4] = 0x0000000000000002 x[5] = 0x0000000000000001 x[6] = 0x04034000004b4000 x[7] = 0x0000000000000001
x[8] = 0xbebebebebebebebe x[9] = 0x17d7d7d7d7d7d7d7 x[10] = 0x00000c04000828ff x[11] = 0x0000000000000001
x[12] = 0x000000002018d383 x[13] = 0x0000000000000000 x[14] = 0xfa0000000000fafa x[15] = 0x000010700001ffff
x[16] = 0x000000019dc012c0 x[17] = 0x00000001021284f8 x[18] = 0x0000000000000000 x[19] = 0x00000001700acdc0
x[20] = 0x0000000000000002 x[21] = 0x000000002018d384 x[22] = 0x16dd16fd2e731151 x[23] = 0x0000007000020000
x[24] = 0x0000000100c69c08 x[25] = 0x0000000100c69c20 x[26] = 0x00006080000013c7 x[27] = 0x0000000100c69c00
x[28] = 0x00000001700acd60 fp = 0x00000001700aceb0 lr = 0x0000000100abce30 sp = 0x00000001700acd60
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV __hash_table:1565 in std::__1::pair<std::__1::__hash_iterator<std::__1::__hash_node<std::string, void*>*>, bool> std::__1::__hash_table<std::string, std::__1::hash<std::string>, std::__1::equal_to<std::string>, std::__1::allocator<std::string>>::__emplace_unique_key_args<std::string, std::string const&>(std::string const&, std::string const&)
Thread T5 created by T0 here:
#0 0x0001020b99d4 in pthread_create+0x5c (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x359d4)
#1 0x000100873910 in std::sys::pal::unix::thread::Thread::new::h77254fdd87a28e05+0x118 (predict_oai-98384e17fb94e863:arm64+0x1000f3910)
#2 0x0001007c7a1c in test::run_test::haeb3c2bcd5ed6cf6+0x76c (predict_oai-98384e17fb94e863:arm64+0x100047a1c)
#3 0x0001007aedb0 in test::console::run_tests_console::he9d142d704f3a986+0x149c (predict_oai-98384e17fb94e863:arm64+0x10002edb0)
#4 0x0001007c5758 in test::test_main::hf86a5e20735245b9+0x118 (predict_oai-98384e17fb94e863:arm64+0x100045758)
#5 0x0001007c5da0 in test::test_main_static::h61ee9c8fd30abca0+0x54 (predict_oai-98384e17fb94e863:arm64+0x100045da0)
...
==45482==ABORTING
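The trace above crashes inside a hash-set insert reached from several threads at once, which is what happens when a shared lookup table is filled lazily on first use. One common way to make such a table thread-safe is a function-local `static const` built in a single initializer: since C++11, the language guarantees that exactly one thread runs that initializer while concurrent callers wait. This is a minimal sketch under that assumption; the names `"root"` and `"dot"` are hypothetical placeholders, not the real reserved-name list, and the actual fix may be written differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical sketch: placeholder names stand in for the real reserved list.
static bool is_reserved_name(const std::string & name) {
    // Initialization of a block-scope static is thread-safe since C++11:
    // the first caller builds the set, concurrent callers block until it is
    // ready, and every later call performs only read-only lookups.
    static const std::unordered_set<std::string> reserved = [] {
        std::unordered_set<std::string> s;
        s.insert("root");
        s.insert("dot");
        return s;
    }();
    return reserved.count(name) > 0;
}
```

Because the set is `const` after construction, concurrent `count()` calls are data-race free; the racy pattern this replaces would instead insert into a shared mutable set on first use from whichever threads happen to arrive together.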

Fix all issues from spec review and quality review of the expert prefetch DMA engine (commit f60f94f):

Spec fixes:
- Req 3: Add design decision comment explaining why hint() is called at MoE dispatch with multi-layer lookahead instead of pre-attention, and why this gives equivalent DMA/compute overlap
- Req 5: Implement hit-rate disable loop: when prediction accuracy drops below 30%, set prefetch_disabled_ flag and short-circuit hint() early with log message

Critical fix:
- #1: Deadlock in await(): extract sycl::event copy while holding lock, release lock before blocking event.wait(), re-acquire to update state

Important fixes:
- #2: TOCTOU in hint_batch_adaptive(): hold mutex_ across the entire function so budget snapshot and consumption are atomic
- #3: has_capacity() counted completed entries; it now counts only active (non-completed) in-flight entries
- #4: gc_completed() safety: add explicit comment tying the safety invariant to the synchronous call chain (ggml_sycl_mul_mat_id -> await -> kernel dispatch -> stream->wait)

Minor fixes:
- #6: Rename PrefetchRequest to prefetch_request (snake_case convention)
- #7: Log warning when all VRAM pool slots fail, permanently disable
- #8: Add initialized_ guard at top of hint_batch()
- #9: Add clarifying comment on n_miss_total <= max_inflight_ check
- #10: Remove dead using alias expert_prefetcher = ExpertPrefetcher
- #11: Rename accuracy_total_ to window_total_ for clarity

Refactored hint() into hint_locked() internal helper so hint_batch() and hint_batch_adaptive() can hold the lock and call it directly, eliminating recursive locking and TOCTOU races.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
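The await() deadlock fix, the capacity-counting fix and the hint_locked() refactor described in this commit follow a common locking pattern. Below is a minimal sketch of that pattern, assuming `std::shared_future<void>` as a stand-in for `sycl::event` and an already-satisfied promise as a stand-in for launching a DMA copy; the class `expert_prefetcher_sketch` and its members are hypothetical and are not the SYCL backend's code.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch: std::shared_future<void> stands in for sycl::event.
class expert_prefetcher_sketch {
public:
    explicit expert_prefetcher_sketch(size_t max_inflight) : max_inflight_(max_inflight) {}

    // Single hint: take the lock once, then call the helper that assumes it is held.
    bool hint(int expert) {
        std::lock_guard<std::mutex> lock(mutex_);
        return hint_locked(expert);
    }

    // Batch hint: the mutex is held for the whole batch, so the capacity check
    // and the slot consumption cannot interleave with another thread (no TOCTOU),
    // and hint_locked() never tries to re-lock the mutex.
    size_t hint_batch(const std::vector<int> & experts) {
        std::lock_guard<std::mutex> lock(mutex_);
        size_t issued = 0;
        for (int expert : experts) {
            if (hint_locked(expert)) {
                ++issued;
            }
        }
        return issued;
    }

    // Copy the event while holding the lock, wait with the lock released,
    // then re-acquire the lock to update the entry's state.
    void await(int expert) {
        std::shared_future<void> event;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = inflight_.find(expert);
            if (it == inflight_.end() || it->second.completed) {
                return;
            }
            event = it->second.event;
        }
        event.wait();  // no lock held here, so other threads can keep calling hint()

        std::lock_guard<std::mutex> lock(mutex_);
        auto it = inflight_.find(expert);
        if (it != inflight_.end()) {
            it->second.completed = true;
        }
    }

private:
    struct prefetch_request {
        std::shared_future<void> event;
        bool completed = false;
    };

    // Caller must hold mutex_. Counts only entries that are still in flight.
    bool has_capacity_locked() const {
        size_t active = 0;
        for (const auto & kv : inflight_) {
            if (!kv.second.completed) {
                ++active;
            }
        }
        return active < max_inflight_;
    }

    // Caller must hold mutex_.
    bool hint_locked(int expert) {
        if (inflight_.count(expert) != 0 || !has_capacity_locked()) {
            return false;
        }
        // Stand-in for launching an asynchronous copy: an already-satisfied future.
        std::promise<void> done;
        done.set_value();
        prefetch_request req;
        req.event = done.get_future().share();
        inflight_.emplace(expert, std::move(req));
        return true;
    }

    std::mutex mutex_;
    size_t max_inflight_;
    std::unordered_map<int, prefetch_request> inflight_;
};
```

Holding the mutex during `event.wait()` would block every other caller until the copy finishes, and deadlocks outright if the thread that would complete the work also needs the mutex; copying the handle out first avoids both.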

* redo: add convert nodes
  This reverts commit 8448acd.
* align clang format with cann
* rename binary_op -> general_op because some ops only take 1 param
* Revert "rename binary_op -> general_op"
  This reverts commit 5be63b1.
* wip
* add GGML_OP_PERMUTE
* add GGML_OP_VIEW and GGML_OP_GET_ROWS
* wip
* Revert "wip"
  This reverts commit 772462c.

Complete experiment log:
#1 4-mag LUT: 15.1 at 8K (BEST, +38%)
#2 Batched extract: 13.7 (+25%)
#3 Inline FA block: 13.5 (I-cache pressure)
#4 Deferred norm: 12.9 (loses ILP)
#5 2-pair half2: 12.0 (ternary overhead)
#6 Select chain: 11.9 (branches kill)
#7 Bit-arithmetic: 11.6 (ALU too heavy)
#8 FMA branchless: 11.4 (ALU still too heavy)
#9 Named-reg ternary: 10.3 (branches worst)
#10 Main (8-LUT): 10.95 (baseline)
#11 Non-vec FA: 10.2 (wrong kernel)
Ceiling: 24.5 (no dequant)

Apple8 hardware truth:
- 1 divergent constant read < 7 ALU ops (even with fma)
- Branches cost MORE than divergent constant reads
- Array indexing ALWAYS spills on Metal
- 4 constant addresses is the sweet spot

The 4-mag LUT is the dequant-level ceiling on Apple Silicon.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: [email protected]

Include Python dependencies in README

029f2b1

ggerganov merged commit 5f2f970 into ggml-org:master Mar 11, 2023

simonw added a commit to simonw/til that referenced this pull request Mar 11, 2023

Update llama-7b-m2.md

ab2620b

This is documented in the LLaMA README now: - ggml-org/llama.cpp#6

SavageShrimp mentioned this pull request Mar 20, 2023

segmentation fault Alpaca #317

Closed

SlyEcho pushed a commit to SlyEcho/llama.cpp that referenced this pull request May 31, 2023

Merge pull request ggml-org#6 from anon998/fix-multibyte

96fa480

Buffer incomplete multibyte characters + other stuff.
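The commit title above refers to holding back the bytes of a UTF-8 character that a token boundary has split in two, so a streaming client never prints half a character. A minimal sketch of that general technique follows; the helper names `incomplete_utf8_tail` and `emit_complete` are hypothetical, and this is not the code from the anon998/fix-multibyte branch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns how many trailing bytes of `s` form an incomplete UTF-8 sequence
// (0 if the string ends on a character boundary).
static size_t incomplete_utf8_tail(const std::string & s) {
    // A UTF-8 sequence is at most 4 bytes, so only the last 3 bytes can be
    // part of an unfinished one.
    for (size_t k = 1; k <= 3 && k <= s.size(); ++k) {
        const unsigned char c = static_cast<unsigned char>(s[s.size() - k]);
        if ((c & 0xC0) == 0x80) {
            continue;  // continuation byte: keep scanning back for the lead byte
        }
        size_t need = 1;                         // ASCII
        if ((c & 0xE0) == 0xC0)      need = 2;   // 110xxxxx
        else if ((c & 0xF0) == 0xE0) need = 3;   // 1110xxxx
        else if ((c & 0xF8) == 0xF0) need = 4;   // 11110xxx
        return need > k ? k : 0;
    }
    return 0;  // no lead byte found in range: treat as malformed and flush it
}

// Hypothetical streaming helper: append a token's bytes, return only complete
// characters, and keep any unfinished sequence in `pending` for the next call.
static std::string emit_complete(std::string & pending, const std::string & piece) {
    pending += piece;
    const size_t keep = incomplete_utf8_tail(pending);
    std::string out = pending.substr(0, pending.size() - keep);
    pending.erase(0, pending.size() - keep);
    return out;
}
```

For example, if "é" (0xC3 0xA9) arrives split across two tokens, the first call returns only the bytes before 0xC3 and the second call returns the whole character.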

windmaple mentioned this pull request Jul 4, 2023

crash when opening the app shixiangcap/llama-jni#1

Open

atopheim mentioned this pull request Sep 7, 2023

Segfault when compiling with make LLAMA_CUBLAS=1 #3054

Closed

4 tasks

ggerganov pushed a commit that referenced this pull request Oct 19, 2023

Merge pull request #6 from damian0815/fssrepo_mac_fixes

9035978

fix compilation errors with llvm

chsasank pushed a commit to chsasank/llama.cpp that referenced this pull request Dec 20, 2023

Update demo in README.md (ggml-org#6)

a81d4a5

* Update demo video in README.md * Update demo at README.md

Dyke-F mentioned this pull request Dec 21, 2023

CUDA error 719 #4563

Closed

3 tasks

nasawyer7 mentioned this pull request Jan 3, 2024

CUDA error: invalid device function when compiling and running for amd gfx 1032 #4762

Closed

segmond mentioned this pull request Jan 14, 2024

train-text-from-scratch oom (in tokenizer?) #4300

Closed

4 tasks

This was referenced Apr 7, 2024

GGML_ASSERT: llama.cpp/ggml-cuda/argsort.cu:48: (ncols & (ncols - 1)) == 0 #6527

Closed

Segmentation fault during IQ3_XS generation. #6597

Closed

micsthepick mentioned this pull request Jul 1, 2024

Bug: GGML assert with bf16, RTX3090 #8234

Closed

ko-alex mentioned this pull request Jul 4, 2024

Bug: gemma 2 27B GGML_ASSERT n_dims <= ne0 #8246

Closed

m828 mentioned this pull request Jul 16, 2024

Bug: ROCm CUDA error #8504

Closed

fan-chao mentioned this pull request Aug 13, 2024

[CANN] Support Q4_0 for Ascend NPU #8822

Merged

4 tasks

slaren mentioned this pull request Aug 15, 2024

Threadpool: take 2 #8672

Merged

4 tasks

znzjugod mentioned this pull request Aug 30, 2024

Bug: A crash occurs when llama-bench is running on multiple cann devices. #9250

Closed

narc1ssus1 mentioned this pull request Jan 23, 2025

Misc. bug: Docker Image llama-quantize Segmentation fault #11196

Closed

ko-alex mentioned this pull request Jan 27, 2025

SIGSEGV during inference #11456

Closed

gaykawadpk mentioned this pull request Feb 12, 2025

Misc. bug: llama-cli crash on ubuntu with GGML-VULKAN=ON #11823

Closed

acbits mentioned this pull request Feb 25, 2025

Regression. Unable to run any model. CRASH!!! #12075

Closed

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

steampunque mentioned this pull request May 4, 2025

Eval bug: b5237 broke Llama Scout #13287

Closed

younesbelkada pushed a commit to younesbelkada/llama.cpp that referenced this pull request May 15, 2025

Merge pull request ggml-org#6 from Eddie-Wang1120/dev-junhuihe

5eb47b7

Fix model architecture name

bjodah mentioned this pull request May 26, 2025

Eval bug: uncaught std::runtime_exception thrown in llama-server during tool use #13812

Closed

Copilot AI mentioned this pull request Dec 29, 2025

Document unresolved code review comments from merged PRs #1 and #2 TheOriginalBytePlayer/llama.cpp#3

Merged

10 tasks

crysolut mentioned this pull request Jan 6, 2026

Eval bug: Segmentation fault qwen3 next rocm/hip gfx906 #17586

Closed

kkaarrss mentioned this pull request Jan 24, 2026

Eval bug: GLM-4.7-Flash flash attention error on (long?) prompts with KV quantization #19036

Closed

sainnhe mentioned this pull request Jan 25, 2026

Eval bug: coredump due to ops of discontinuous tensor memory #19078

Closed

darksylinc mentioned this pull request Jan 26, 2026

Eval bug: Assert in kv-cache using qwen3-vl #19116

Open

pwilkin mentioned this pull request Feb 4, 2026

Misc. bug: Race conditions from CI #19328

Closed

turtle0x1 mentioned this pull request Feb 4, 2026

Misc. bug: Qwen3 coder next server crash #19329

Closed

ToddyTsui mentioned this pull request Feb 5, 2026

Eval bug: llama.cpp server crashed when running QWen3-Coder-Next GGUF model on Ubuntu 24.04 on Strix Halo #19355

Open

odellus mentioned this pull request Feb 7, 2026

Eval bug: Qwen3-Coder-Next crashing #19421

Closed

crysolut mentioned this pull request Feb 8, 2026

Eval bug: ggml_cuda_compute_forward: SOLVE_TRI failed on ROCm 6.4.3 2 X gfx906 GPU 32Gb - on Qwen3-Coder-Next #19442

Closed

jacekpoplawski mentioned this pull request Feb 10, 2026

models : optimizing qwen3next graph #19375

Merged

3 tasks

gaugarg-nv pushed a commit to gaugarg-nv/llama.cpp that referenced this pull request Feb 16, 2026

Merge pull request ggml-org#6 from gaugarg-nv/get_host_buffer_type

f0198ef

Support device-specific host buffer types in meta backend

henry701 mentioned this pull request Feb 19, 2026

Eval bug: CUDA backend crash on GLM-4.7-Flash with FA on and quantized KV cache #19724

Closed

wallentri88 mentioned this pull request Feb 24, 2026

Eval bug: qwen35 and qwen35moe graph split issues (Severe PP impact, crashes) #19864

Closed

steampunque mentioned this pull request Feb 25, 2026

Eval bug: Qwen 3.5 27B crashes running perplexity with RPC #19892

Open

snapo mentioned this pull request Feb 25, 2026

Eval bug: Qwen 3.5 27B GGUF from unsloth hard crash #19906

Closed

MartinEmrich mentioned this pull request Feb 28, 2026

Eval bug: Memory leak? using ROCm #19979

Open

xinye0123 mentioned this pull request Mar 10, 2026

Eval bug: [MUSA] Illegal memory access in SOLVE_TRI on MTT S80 during warmup #20331

Open

feyleth mentioned this pull request Mar 11, 2026

Eval bug: vision model crash #20418

Open

LucianoJBarbosa mentioned this pull request Mar 14, 2026

[ROCm] gfx1010 (RX 5500 XT) HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION on load_all_data #20564

Closed

wronglebowsk mentioned this pull request Mar 16, 2026

Misc. bug: OpenVINO cannot run the operation (CPY) #20619

Open

luke1105 mentioned this pull request Mar 20, 2026

Eval bug: OpenVino: Cant load Qwen3.5 #20562

Open

stew675 mentioned this pull request Mar 25, 2026

Eval bug: PR20908 breaks rpc-server functionality when balancing split a model across multiple machines. #21006

Closed

rubin55 mentioned this pull request Mar 26, 2026

Eval bug: Unresolved Symbol <__memcpy_chk> when running (any?) model #21041

Closed

Include Python dependencies in README #6: ggerganov merged 1 commit into ggml-org:master from simonw:patch-1